The Story:

  • A bit of a chicken and an egg tale: There are three hypotheses that look at a full circle story from settlement to adulthood
  • Hypothesis 1a: Does percent cover of adult corals drive the total number of settlers on a tile? (spoiler alert: yes)
  • Hypothesis 1b: Does silicate drive the total number of settlers on a tile? (spoiler alert: also yes, but slighly less than cover)
  • Hypothesis 2a: Once settled, what factors drive survivorship? Density dependence! (spoiler alert: duh!)
  • Hypothesis 2b: Beyond density dependence, does SGD impact the survivorship (spoiler alert: no)
  • Hypothesis 3: Of the settlers that survive, does the number impact the percent cover into adulthood? (spoiler alert: sort of, it’s complicated)

The Models:

Although this story looks as if it should be a perfect fit for an SEM, there are a few roadblocks: - percent cover and SGD are highly correlated/covaried. Need to figure out how to deal with that in an SEM because SEM does not accept separate models with the same response variable, so SGD and percent cover would need to be together, but then results are not accurate depiction of the situation.

  • The zeros are removed from Hyp 3 because only a few indivdual recruits on the plates and they were skewing the data

  • Considering Varari and Cabral in separate models; however, maybe combine all into one model and use Location as an interaction term

Load libraries

NOTE: needed to download and install an old version of “heavy” package because it was removed from the CRAN repository in 2022

Read in Data and Tidy

  • Includes some background info like range of settler counts between sites and mean number of settlers
## [1] 1
## [1] 146
## [1] 21.75
## [1] 0
## [1] 1
## [1] 0.5144263
## [1] 2
## [1] 131
## [1] 33.8
## [1] 0
## [1] 1
## [1] 0.3693079

Hypothesis 1 (a and b):

  • Using a generalized linear model, negative binomial

  • Results: There is a site difference for the effect of silicate, which makes sense because the two sites are so different in their dynamics (silicate concentration, flow, etc). However, there is no effect of site for percent cover. Percent cover significantly affects the number of total recruits that settle on the tiles; however silicate only affects the total number of settlers at Varari (linearly and non-linearly), but not at Cabral. According to the AIC, the silicate model is a lower rank, so percent cover is a better/stronger driver of total settlement, but silicate is still important. Looking at the standardized effect sizes, the power of the relationship is strong with ….

  • Why separate it into 2 models? Because too many variables at play, and silicate & percent cover are too highly correlated

#######################
## decision is to keep both sites together and use "Location" as an interaction term in the model 
#######################

############
#### Hypothesis 1a - percent cover 
PocCover_negbinom <- glm.nb((sum_total+1) ~ log(percent_cover+1)*Location, data=SettlementDF_Full)

summary(PocCover_negbinom) ## RESULT: log percent cover is significant, no location interaction or just location effect 
## 
## Call:
## glm.nb(formula = (sum_total + 1) ~ log(percent_cover + 1) * Location, 
##     data = SettlementDF_Full, init.theta = 1.109358935, link = log)
## 
## Coefficients:
##                                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                             2.1524     0.3984   5.402 6.58e-08 ***
## log(percent_cover + 1)                  0.7256     0.2827   2.566   0.0103 *  
## LocationVarari                          0.8417     0.5169   1.628   0.1035    
## log(percent_cover + 1):LocationVarari  -0.3824     0.3381  -1.131   0.2581    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.1094) family taken to be 1)
## 
##     Null deviance: 56.158  on 39  degrees of freedom
## Residual deviance: 44.170  on 36  degrees of freedom
## AIC: 349.05
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.109 
##           Std. Err.:  0.236 
## 
##  2 x log-likelihood:  -339.049
#r.squaredGLMM(PocCover_negbinom)
## AIC = 345.31, p-value = 0.01 

## try without logging percent cover 
PocCover_negbinom2 <- glm.nb(sum_total ~ percent_cover*Location, data=SettlementDF_Full)

summary(PocCover_negbinom2) ## RESULT: slightly less significant, but same results 
## 
## Call:
## glm.nb(formula = sum_total ~ percent_cover * Location, data = SettlementDF_Full, 
##     init.theta = 0.9627769765, link = log)
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   2.39356    0.34385   6.961 3.38e-12 ***
## percent_cover                 0.18364    0.07744   2.371   0.0177 *  
## LocationVarari                0.57558    0.45594   1.262   0.2068    
## percent_cover:LocationVarari -0.11861    0.08345  -1.421   0.1552    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.9628) family taken to be 1)
## 
##     Null deviance: 56.298  on 39  degrees of freedom
## Residual deviance: 44.715  on 36  degrees of freedom
## AIC: 345.31
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.963 
##           Std. Err.:  0.204 
## 
##  2 x log-likelihood:  -335.309
## AIC = 345.31, p-value = 0.01 
check_model(PocCover_negbinom2)

############
#### Hypothesis 1b - SGD 
SGD_negbinom <- glm.nb(sum_total ~ (silicate_avg + I(silicate_avg^2))*Location, data=SettlementDF_Full)

summary(SGD_negbinom) ## RESULT: silicate at Varari is significant both linear and nonlinear 
## 
## Call:
## glm.nb(formula = sum_total ~ (silicate_avg + I(silicate_avg^2)) * 
##     Location, data = SettlementDF_Full, init.theta = 0.9122803951, 
##     link = log)
## 
## Coefficients:
##                                  Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                       1.72899    2.37335   0.729   0.4663  
## silicate_avg                      0.62979    1.51219   0.416   0.6771  
## I(silicate_avg^2)                -0.06483    0.21840  -0.297   0.7666  
## LocationVarari                   -5.10510    3.85202  -1.325   0.1851  
## silicate_avg:LocationVarari       7.17745    3.63825   1.973   0.0485 *
## I(silicate_avg^2):LocationVarari -1.94437    0.85219  -2.282   0.0225 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.9123) family taken to be 1)
## 
##     Null deviance: 53.537  on 39  degrees of freedom
## Residual deviance: 45.038  on 34  degrees of freedom
## AIC: 351.87
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.912 
##           Std. Err.:  0.191 
## 
##  2 x log-likelihood:  -337.872
## AIC = 351.39 
check_model(SGD_negbinom) ## obviously very collinear bc linear and nonlinear 

#######################
## plot the negative binomial 
#######################










#######################
## standardized effect sizes  
#######################
# Note: won't work on dependent var bc changes count data to continuous which then doesn't use glm.nb()
## only calculate sef for independent vars of SGD, SGD^2, and percent cover
## and take sef before the model, create new df 

############
## get z-scores of just independent vars before going into the model 
## You can get the zscore by using scale(x, center=TRUE, scale=TRUE)
## but need to make new df that has "parameter" which is the silicate or cover  and "zscores" which is the standardized score 


#percentcover_sef <- tibble("percentcover_zscore" = scale(SettlementDF_Full_C$percent_cover, center = TRUE, scale = TRUE))


#SettlementDF_Full_C$z_percent_cover <- scale(SettlementDF_Full_C$percent_cover, center = TRUE, scale = TRUE)
#SettlementDF_Full_C$z_SGD <- scale(SettlementDF_Full_C$silicate_avg, center = TRUE, scale = TRUE)
#SettlementDF_Full_C$z_SGD2 <- scale(SettlementDF_Full_C$silicate_avg^2, center = TRUE, scale = TRUE)

#SettlementDF_Full_V$z_percent_cover <- scale(SettlementDF_Full_V$percent_cover, center = TRUE, scale = TRUE)
#SettlementDF_Full_V$z_SGD <- scale(SettlementDF_Full_V$silicate_avg, center = TRUE, scale = TRUE)
#SettlementDF_Full_V$z_SGD2 <- scale(SettlementDF_Full_V$silicate_avg^2, center = TRUE, scale = TRUE)

Hypothesis 2a:

  • Using a general linear model (gaussian lm)
  • Results: Significant with or without the 0s dropped for the 4 tiles that had few recruits and 100% mortality. For time being, keep the 0s in the model if significant. It is clear the density dependence is happening and driving the survivorship of the corals that settle on the tiles. Location is once again insignificant, as expected
### model the proportion of settlers that are alive out of the total on the plates 

proportion_alive <- lm(proportion_alive ~ log(sum_total+1)*Location, data=SettlementDF_Full)

anova(proportion_alive) ## barely significant, but still significant, Location is not sig (as expected)
## Analysis of Variance Table
## 
## Response: proportion_alive
##                             Df  Sum Sq  Mean Sq F value  Pr(>F)  
## log(sum_total + 1)           1 0.28120 0.281202  4.4035 0.04294 *
## Location                     1 0.12179 0.121793  1.9072 0.17578  
## log(sum_total + 1):Location  1 0.08614 0.086137  1.3489 0.25312  
## Residuals                   36 2.29892 0.063859                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(proportion_alive) ## report R squared too in results 
## 
## Call:
## lm(formula = proportion_alive ~ log(sum_total + 1) * Location, 
##     data = SettlementDF_Full)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56230 -0.14862  0.02246  0.13512  0.59074 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        0.76053    0.13328   5.706 1.72e-06 ***
## log(sum_total + 1)                -0.10187    0.04996  -2.039   0.0489 *  
## LocationVarari                    -0.32753    0.20191  -1.622   0.1135    
## log(sum_total + 1):LocationVarari  0.08026    0.06911   1.161   0.2531    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2527 on 36 degrees of freedom
## Multiple R-squared:  0.1754, Adjusted R-squared:  0.1067 
## F-statistic: 2.553 on 3 and 36 DF,  p-value: 0.07069
check_model(proportion_alive)

### remove the 0s from both Varari and Cabral because skewing the data - 0s meaning the plates that had settlers, but none survived -- there are only a handful of recruits on those plates, so probably weren't even supposed to be there 
### for all of the plates with 0, 6 or less settlers 

proportion_alive_noZero <- lm(proportion_alive ~ log(sum_total+1)*Location, data=SettlementDF_Full %>%
                                filter(proportion_alive>0))

anova(proportion_alive_noZero) ## super significant, Location is not sig (as expected)
## Analysis of Variance Table
## 
## Response: proportion_alive
##                             Df  Sum Sq Mean Sq F value    Pr(>F)    
## log(sum_total + 1)           1 0.84218 0.84218 25.4635 1.738e-05 ***
## Location                     1 0.00003 0.00003  0.0009    0.9765    
## log(sum_total + 1):Location  1 0.01971 0.01971  0.5959    0.4458    
## Residuals                   32 1.05837 0.03307                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(proportion_alive_noZero)
## 
## Call:
## lm(formula = proportion_alive ~ log(sum_total + 1) * Location, 
##     data = SettlementDF_Full %>% filter(proportion_alive > 0))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.309706 -0.089731 -0.006428  0.095317  0.307061 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        0.81691    0.09753   8.376 1.43e-09 ***
## log(sum_total + 1)                -0.11284    0.03612  -3.124  0.00378 ** 
## LocationVarari                     0.12431    0.17584   0.707  0.48473    
## log(sum_total + 1):LocationVarari -0.04335    0.05616  -0.772  0.44582    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1819 on 32 degrees of freedom
## Multiple R-squared:  0.4488, Adjusted R-squared:  0.3972 
## F-statistic: 8.687 on 3 and 32 DF,  p-value: 0.0002322
check_model(proportion_alive_noZero) ## high collinearity ** 

######################
## plot
######################

### with 0s 
### don't worry about faceting by location because we are keeping both sites together in the model 
densitydep_plot <- SettlementDF_Full %>% 
  ggplot(aes(x=log(sum_total+1), 
             y=proportion_alive)) +
  geom_point(size=4, aes(color=Location)) + 
  scale_color_manual(values=c("darkgoldenrod2", "firebrick4"), 
                    limits=c("Cabral", "Varari")) +
  geom_smooth(method="lm", color="black") + 
  theme_classic() + 
  labs(x= "Logged Total Settlers", 
       y= "Proportion of Settlers Alive") + 
  theme(axis.text.x=element_text(size=20), 
        axis.text.y=element_text(size=20), 
        axis.title.x=element_text(size=20), 
        axis.title.y=element_text(size=20), 
        legend.text=element_text(size=20),  # Adjust legend text size
        legend.title=element_text(size=20), # Adjust legend title size
        strip.text=element_text(size=20))

densitydep_plot
## `geom_smooth()` using formula = 'y ~ x'

ggsave(here::here("Outputs", "RecruitmentTiles", "Thesis_Figures", "DensityPlot.jpg"), 
       width=12, height=9)
## `geom_smooth()` using formula = 'y ~ x'
### without 0s 

densitydep_plot_noZeros <- SettlementDF_Full %>% 
  filter(proportion_alive>0) %>% 
   ggplot(aes(x=log(sum_total+1), 
             y=proportion_alive)) +
  geom_point(size=4, aes(color=Location)) + 
  scale_color_manual(values=c("darkgoldenrod2", "firebrick4"), 
                    limits=c("Cabral", "Varari")) +
  geom_smooth(method="lm", color="black") + 
  theme_classic() + 
  labs(x= "Logged Total Settlers", 
       y= "Proportion of Settlers Alive") + 
  theme(axis.text.x=element_text(size=20), 
        axis.text.y=element_text(size=20), 
        axis.title.x=element_text(size=20), 
        axis.title.y=element_text(size=20), 
        legend.text=element_text(size=20),  # Adjust legend text size
        legend.title=element_text(size=20), # Adjust legend title size
        strip.text=element_text(size=20))

densitydep_plot_noZeros
## `geom_smooth()` using formula = 'y ~ x'

ggsave(here::here("Outputs", "RecruitmentTiles", "Thesis_Figures", "DensityPlot_NoZero.jpg"), 
       width=10, height=9)
## `geom_smooth()` using formula = 'y ~ x'

Hypothesis 2a:

  • Using a generalized linear model (family, student’s t distribution OR quasi binomial?) – student’s t does not weight as heavily on outliers. Use instead of a normal distribution in gaussian model. Also works for smaller sample sizes.
  • Results: When trying quasibinomial, results are significant with log link (though, double check if that is correct because already loged sum_total in the model).
###########################
## student ts
###########################

############
## trying from stack overflow, old heavy package 
#install_url('http://cran.r-project.org/src/contrib/Archive/heavy/heavy_0.38.19.tar.gz') ## not working right now  

#fit <- heavyLm(m.marietta ~ CRSP, data = SettlementDF_Full, family = Student(df = 6))
#summary(fit)


#############
## trying with gaussian and identity link 
#studentt_version1 <- stan_glm(proportion_alive ~ log(sum_total+1)*Location, data = SettlementDF_Full, family = gaussian(link = "identity"))

#summary(studentt_version1) ## not sure how to interpret 



###########################
## quasibinomial 
###########################
#RESULT: neither of these with "logit" link are significant  
#BUT significant with "log" family. Without zero model did not converge. 

#### with zeroes 
quasibin_model <- glm(proportion_alive ~ log(sum_total+1)*Location, family = quasibinomial(link="log"), data=SettlementDF_Full)

summary(quasibin_model)
## 
## Call:
## glm(formula = proportion_alive ~ log(sum_total + 1) * Location, 
##     family = quasibinomial(link = "log"), data = SettlementDF_Full)
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                        0.007783   0.198677   0.039  0.96897   
## log(sum_total + 1)                -0.294217   0.103082  -2.854  0.00711 **
## LocationVarari                    -0.851114   0.437535  -1.945  0.05959 . 
## log(sum_total + 1):LocationVarari  0.241658   0.163563   1.477  0.14825   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.2676346)
## 
##     Null deviance: 14.483  on 39  degrees of freedom
## Residual deviance: 12.023  on 36  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 8
check_model(quasibin_model)

#### without zeroes 

quasibin_model_noZero <-  glm(proportion_alive ~ log(sum_total+1)*Location, family = quasibinomial(link="log"), data=SettlementDF_Full %>%
                                filter(proportion_alive>0))
## Warning: glm.fit: algorithm did not converge
summary(quasibin_model)
## 
## Call:
## glm(formula = proportion_alive ~ log(sum_total + 1) * Location, 
##     family = quasibinomial(link = "log"), data = SettlementDF_Full)
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                        0.007783   0.198677   0.039  0.96897   
## log(sum_total + 1)                -0.294217   0.103082  -2.854  0.00711 **
## LocationVarari                    -0.851114   0.437535  -1.945  0.05959 . 
## log(sum_total + 1):LocationVarari  0.241658   0.163563   1.477  0.14825   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.2676346)
## 
##     Null deviance: 14.483  on 39  degrees of freedom
## Residual deviance: 12.023  on 36  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 8
check_model(quasibin_model_noZero) ## some high collinearity 

Hypothesis 2b:

  • Using a general linear model to plot the residuals of density dependence (Hyp 2a) against silicate to see if survivorship beyond density dependence is being driven by silicate
  • Results: Either with or without the zeros, models are very insignificant. This indicates there is no additional effect of silicate beyond what survivorship is already determined by density dependence.
################################
## calculate residuals: plotting the residuals from previous model from hyp 2a against silicate 
################################

#### with zeroes 
resid(proportion_alive)
##            1            2            3            4            5            6 
##  0.103180661 -0.234672191 -0.148615533  0.136257424 -0.284418902 -0.030158310 
##            7            8            9           10           11           12 
##  0.015331247  0.183737048 -0.067864920 -0.148615533 -0.211427814  0.282074671 
##           13           14           15           16           17           18 
##  0.310081414  0.351384467  0.351384467 -0.562304807  0.039411193 -0.161704446 
##           19           20           21           22           23           24 
##  0.029583750  0.047356116 -0.098284074 -0.004778701 -0.086284870 -0.235905358 
##           25           26           27           28           29           30 
##  0.114471662  0.590736950 -0.144692120  0.134742288  0.183355617 -0.145213635 
##           31           32           33           34           35           36 
##  0.205711886  0.124168843 -0.398227038 -0.409263050  0.009362440  0.058333295 
##           37           38           39           40 
##  0.080041717  0.378211732 -0.409263050  0.052775466
resid_proportionalive <- SettlementDF_Full %>% 
  bind_cols(residuals = resid(proportion_alive)) 

#### without zeroes
resid(proportion_alive_noZero)
##            1            2            3            4            5            6 
##  0.084126854 -0.263781634 -0.192938565  0.128888055 -0.309705928 -0.057576059 
##            7            8            9           10           11           12 
## -0.011329355  0.153672717 -0.082967955 -0.192938565 -0.237380152  0.280461583 
##           13           14           15           16           17           18 
##  0.261308693  0.307061435  0.307061435  0.015344789 -0.193970993 -0.001526542 
##           19           20           21           22           23           24 
##  0.006190189  0.004963123  0.044612696 -0.045721926 -0.086965371 -0.098028525 
##           25           26           27           28           29           30 
##  0.230379314 -0.178309681  0.048519731 -0.044996476 -0.298249473 -0.061358086 
##           31           32           33           34           35           36 
##  0.231644210 -0.050385734  0.062440357  0.042406376  0.251306507 -0.052257043
resid_proportionalive_noZero <- SettlementDF_Full %>% 
  filter(proportion_alive>0) %>% 
  bind_cols(residuals = resid(proportion_alive_noZero)) 

#####################
## models 
#####################

#### with zeroes 

resids_silicateEffect_model <- lm(residuals ~ silicate_avg*Location, data=resid_proportionalive)

anova(resids_silicateEffect_model) ## very insignificant 
## Analysis of Variance Table
## 
## Response: residuals
##                       Df  Sum Sq  Mean Sq F value Pr(>F)
## silicate_avg           1 0.00012 0.000117  0.0019 0.9655
## Location               1 0.00008 0.000078  0.0013 0.9720
## silicate_avg:Location  1 0.07067 0.070671  1.1419 0.2924
## Residuals             36 2.22805 0.061890
summary(resids_silicateEffect_model)
## 
## Call:
## lm(formula = residuals ~ silicate_avg * Location, data = resid_proportionalive)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.53619 -0.17503  0.02615  0.13891  0.52805 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                  0.09099    0.17254   0.527    0.601
## silicate_avg                -0.02570    0.04612  -0.557    0.581
## LocationVarari              -0.24427    0.24701  -0.989    0.329
## silicate_avg:LocationVarari  0.10373    0.09708   1.069    0.292
## 
## Residual standard error: 0.2488 on 36 degrees of freedom
## Multiple R-squared:  0.03083,    Adjusted R-squared:  -0.04994 
## F-statistic: 0.3817 on 3 and 36 DF,  p-value: 0.7668
check_model(resids_silicateEffect_model)

#### without zeroes 

resids_silicateEffect_model_noZero <- lm(residuals ~ silicate_avg*Location, data=resid_proportionalive_noZero)

anova(resids_silicateEffect_model_noZero)
## Analysis of Variance Table
## 
## Response: residuals
##                       Df  Sum Sq  Mean Sq F value Pr(>F)
## silicate_avg           1 0.00093 0.000933  0.0286 0.8668
## Location               1 0.00053 0.000534  0.0163 0.8991
## silicate_avg:Location  1 0.01188 0.011880  0.3638 0.5507
## Residuals             32 1.04502 0.032657
summary(resids_silicateEffect_model_noZero)
## 
## Call:
## lm(formula = residuals ~ silicate_avg * Location, data = resid_proportionalive_noZero)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33149 -0.07936 -0.01351  0.09740  0.31033 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                  0.010988   0.126106   0.087    0.931
## silicate_avg                -0.003151   0.034149  -0.092    0.927
## LocationVarari              -0.092550   0.185599  -0.499    0.621
## silicate_avg:LocationVarari  0.044372   0.073567   0.603    0.551
## 
## Residual standard error: 0.1807 on 32 degrees of freedom
## Multiple R-squared:  0.01261,    Adjusted R-squared:  -0.07996 
## F-statistic: 0.1362 on 3 and 32 DF,  p-value: 0.9377
check_model(resids_silicateEffect_model_noZero)

#####################
## plot 
#####################
#### plot the residuals (this basically says that whatever points are above the best fit line are surviving more than expected due to density dependence because of SGD, anything below the line is surviving less than expected due to SGD)

#### with zeroes 
survivorship_silicateEffect <- resid_proportionalive %>% 
  ggplot(aes(x=silicate_avg, 
             y=residuals)) +
  geom_point(size=4, aes(color=Location)) + 
  scale_color_manual(values=c("darkgoldenrod2", "firebrick4"), 
                    limits=c("Cabral", "Varari")) +
  geom_smooth(method="lm", color="black") + 
  theme_classic() + 
  labs(x = expression(Silicate~(mu*mol~L^-1)), 
       y= "Residuals of Alive Settlers") + 
  theme(axis.text.x=element_text(size=20), 
        axis.text.y=element_text(size=20), 
        axis.title.x=element_text(size=20), 
        axis.title.y=element_text(size=20), 
        legend.text=element_text(size=20),  # Adjust legend text size
        legend.title=element_text(size=20), # Adjust legend title size
        strip.text=element_text(size=20))
  

survivorship_silicateEffect
## `geom_smooth()` using formula = 'y ~ x'

ggsave(here::here("Outputs", "RecruitmentTiles", "Thesis_Figures", "Survivorship_SilicateEffect.jpg"), 
       width=12, height=9)
## `geom_smooth()` using formula = 'y ~ x'
#### without zeroes 
survivorship_silicateEffect_noZero <- resid_proportionalive_noZero %>% 
  ggplot(aes(x=silicate_avg, 
             y=residuals)) +
  geom_point(size=4, aes(color=Location)) + 
  scale_color_manual(values=c("darkgoldenrod2", "firebrick4"), 
                    limits=c("Cabral", "Varari")) +
  geom_smooth(method="lm", color="black") + 
  theme_classic() + 
  labs(x = expression(Silicate~(mu*mol~L^-1)), 
       y= "Residuals of Alive Settlers") + 
  theme(axis.text.x=element_text(size=20), 
        axis.text.y=element_text(size=20), 
        axis.title.x=element_text(size=20), 
        axis.title.y=element_text(size=20), 
        legend.text=element_text(size=20),  # Adjust legend text size
        legend.title=element_text(size=20), # Adjust legend title size
        strip.text=element_text(size=20))
  
survivorship_silicateEffect_noZero
## `geom_smooth()` using formula = 'y ~ x'

ggsave(here::here("Outputs", "RecruitmentTiles", "Thesis_Figures", "Survivorship_SilicateEffect_NoZero.jpg"), 
       width=10, height=9)
## `geom_smooth()` using formula = 'y ~ x'

Hypothesis 3:

  • Using a general linear model (double check correctness)
  • Results:
  • Notes: when remove 0s from percent cover and total alive (though there are quite a few between both sites), you see a pretty strong relationship. The limitation with this hypothesis is that the data previously collected was percent cover data, and we do not have a true density count, so the 0s might not be a true reflection of the actual number of adult corals near the tile. The proportion of alive settlers is also a bit stronger here than just the total number of settlers.
##################
## models 
##################

## total alive 
alive_predict_cover <- lm(percent_cover ~ sum_total_alive*Location, data =SettlementDF_Full)

anova(alive_predict_cover) ## insignificant, though logging makes it WAY better and almost significant (0.06)
## Analysis of Variance Table
## 
## Response: percent_cover
##                          Df  Sum Sq Mean Sq F value Pr(>F)
## sum_total_alive           1   70.53  70.525  2.2513 0.1422
## Location                  1   76.39  76.390  2.4385 0.1271
## sum_total_alive:Location  1   83.55  83.554  2.6672 0.1112
## Residuals                36 1127.77  31.327
summary(alive_predict_cover)
## 
## Call:
## lm(formula = percent_cover ~ sum_total_alive * Location, data = SettlementDF_Full)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.032 -3.120 -1.425  3.139 20.506 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                     2.85949    1.43875   1.987   0.0545 .
## sum_total_alive                 0.03253    0.07097   0.458   0.6495  
## LocationVarari                  0.36096    2.30200   0.157   0.8763  
## sum_total_alive:LocationVarari  0.23165    0.14184   1.633   0.1112  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.597 on 36 degrees of freedom
## Multiple R-squared:  0.1697, Adjusted R-squared:  0.1005 
## F-statistic: 2.452 on 3 and 36 DF,  p-value: 0.07909
## proportion alive 

propalive_predict_cover <- lm(percent_cover ~ proportion_alive*Location, data = SettlementDF_Full)

anova(propalive_predict_cover) ## significant 
## Analysis of Variance Table
## 
## Response: percent_cover
##                           Df  Sum Sq Mean Sq F value  Pr(>F)  
## proportion_alive           1  142.31 142.314  4.6655 0.03751 *
## Location                   1   33.99  33.994  1.1144 0.29816  
## proportion_alive:Location  1   83.81  83.809  2.7475 0.10609  
## Residuals                 36 1098.12  30.503                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(propalive_predict_cover)
## 
## Call:
## lm(formula = percent_cover ~ proportion_alive * Location, data = SettlementDF_Full)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.437  -3.247  -1.031   1.812  17.562 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                       3.5028     2.7569   1.271   0.2120  
## proportion_alive                 -0.6183     4.7914  -0.129   0.8980  
## LocationVarari                    6.9336     3.5295   1.964   0.0572 .
## proportion_alive:LocationVarari -11.4101     6.8836  -1.658   0.1061  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.523 on 36 degrees of freedom
## Multiple R-squared:  0.1915, Adjusted R-squared:  0.1241 
## F-statistic: 2.843 on 3 and 36 DF,  p-value: 0.05132
##################
## models without ZERO
##################

## total alive 
alive_predict_cover_noZero <- lm(percent_cover ~ sum_total_alive*Location, data =SettlementDF_Full %>% 
                                   filter(percent_cover>0))

anova(alive_predict_cover_noZero) ## significant, better without logging 
## Analysis of Variance Table
## 
## Response: percent_cover
##                          Df Sum Sq Mean Sq F value  Pr(>F)  
## sum_total_alive           1  29.56  29.564  0.9619 0.33611  
## Location                  1 193.71 193.712  6.3024 0.01889 *
## sum_total_alive:Location  1  46.96  46.958  1.5278 0.22794  
## Residuals                25 768.41  30.736                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(alive_predict_cover_noZero)
## 
## Call:
## lm(formula = percent_cover ~ sum_total_alive * Location, data = SettlementDF_Full %>% 
##     filter(percent_cover > 0))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8974 -2.7918 -0.9453  2.4895 17.7228 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                     3.77664    1.61164   2.343   0.0274 *
## sum_total_alive                 0.01777    0.07151   0.248   0.8058  
## LocationVarari                  2.88878    2.78900   1.036   0.3102  
## sum_total_alive:LocationVarari  0.19126    0.15474   1.236   0.2279  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.544 on 25 degrees of freedom
## Multiple R-squared:  0.2602, Adjusted R-squared:  0.1714 
## F-statistic: 2.931 on 3 and 25 DF,  p-value: 0.05315
check_model(alive_predict_cover_noZero)

## proportion alive 

propalive_predict_cover_noZero <- lm(percent_cover ~ proportion_alive*Location, data = SettlementDF_Full %>% 
                                       filter(percent_cover>0))

anova(propalive_predict_cover_noZero) ## very significant 
## Analysis of Variance Table
## 
## Response: percent_cover
##                           Df Sum Sq Mean Sq F value   Pr(>F)   
## proportion_alive           1 233.02 233.017  8.4472 0.007552 **
## Location                   1  64.17  64.172  2.3263 0.139753   
## proportion_alive:Location  1  51.83  51.827  1.8788 0.182651   
## Residuals                 25 689.63  27.585                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(propalive_predict_cover_noZero)
## 
## Call:
## lm(formula = percent_cover ~ proportion_alive * Location, data = SettlementDF_Full %>% 
##     filter(percent_cover > 0))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7592 -2.5642 -0.9592  2.0417 14.1464 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                        5.917      3.267   1.811   0.0821 .
## proportion_alive                  -3.499      5.405  -0.647   0.5233  
## LocationVarari                     8.249      4.195   1.967   0.0604 .
## proportion_alive:LocationVarari  -11.964      8.728  -1.371   0.1827  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.252 on 25 degrees of freedom
## Multiple R-squared:  0.336,  Adjusted R-squared:  0.2564 
## F-statistic: 4.217 on 3 and 25 DF,  p-value: 0.01523
check_model(propalive_predict_cover_noZero)

##################
## plots  
##################

## total alive 

alive_affect_cover <- SettlementDF_Full %>% 
  filter(percent_cover>0) %>% ## decide whether to keep in or not **
  ggplot(aes(x=sum_total_alive, 
             y=percent_cover)) +
  geom_point(size=4, aes(color=Location)) + 
  scale_color_manual(values=c("darkgoldenrod2", "firebrick4"), 
                    limits=c("Cabral", "Varari")) +
  geom_smooth(method="lm", color="black") + 
  theme_classic() + 
  labs(x = "Total Alive Settlers", 
       y= "Percent Cover of P. acuta adults") + 
  theme(axis.text.x=element_text(size=20), 
        axis.text.y=element_text(size=20), 
        axis.title.x=element_text(size=20), 
        axis.title.y=element_text(size=20), 
        legend.text=element_text(size=20),  # Adjust legend text size
        legend.title=element_text(size=20), # Adjust legend title size
        strip.text=element_text(size=20))

alive_affect_cover
## `geom_smooth()` using formula = 'y ~ x'

ggsave(here::here("Outputs", "RecruitmentTiles", "Thesis_Figures", "Cover_AliveEffect.jpg"), 
       width=12, height=9)
## `geom_smooth()` using formula = 'y ~ x'
## proportion alive 

propalive_affect_cover <- SettlementDF_Full %>% 
  filter(percent_cover>0) %>% 
  ggplot(aes(x=proportion_alive, 
             y=percent_cover)) +
  geom_point(size=4, aes(color=Location)) + 
  scale_color_manual(values=c("darkgoldenrod2", "firebrick4"), 
                    limits=c("Cabral", "Varari")) +
  geom_smooth(method="lm", color="black") + 
  theme_classic() + 
  labs(x = "Proportion of Alive Settlers", 
       y= "Percent Cover of P. acuta adults") + 
  theme(axis.text.x=element_text(size=20), 
        axis.text.y=element_text(size=20), 
        axis.title.x=element_text(size=20), 
        axis.title.y=element_text(size=20), 
        legend.text=element_text(size=20),  # Adjust legend text size
        legend.title=element_text(size=20), # Adjust legend title size
        strip.text=element_text(size=20))

propalive_affect_cover
## `geom_smooth()` using formula = 'y ~ x'

ggsave(here::here("Outputs", "RecruitmentTiles", "Thesis_Figures", "Cover_PropAliveEffect.jpg"), 
       width=12, height=9)
## `geom_smooth()` using formula = 'y ~ x'

Incorporating an SEM?